An Improved Delta-Centralization Method for Population Stratification

Abstract

Dadd et al. [Hum Hered 2010; 69: 285–294] recently criticized our delta-centralization (DC) method of controlling for population stratification (PS) and concluded that DC does not work. To explore our method, the authors simulated data under the Balding-Nichols (BN) model, which is more general than the model we had used in our simulations. They determined that the DC method underestimated the PS parameter ($\delta$) and inflated the type I error rates when applied to BN-simulated data, and from this they concluded that the DC method is invalid. However, we argue that this conclusion is premature. In this paper, we (1) show why $\delta$ is underestimated and type I error rates are inflated when BN-simulated data are used, and (2) present a simple adjustment to DC that works reasonably well for data from both kinds of simulations. We also show that the adjusted DC method has appropriate power under a range of scenarios.

Copyright © 2011 S. Karger AG, Basel. Received: August 10, 2010. Accepted after revision: March 21, 2011. Published online: July 20, 2011. Hum Hered 2011;71:180–185.

Prakash Gorroochurn, Division of Statistical Genetics, R620, Department of Biostatistics, Columbia University, 722 W 168th Street, New York, NY 10032 (USA). Tel. +1 212 342 1263, E-Mail pg2113@columbia.edu

[…] distribution, where $F_{ST}$ is Wright's coefficient of genetic differentiation and $p_{ref}$ is a reference allele frequency that Dadd et al. equated to our population-average allele frequency at the test locus [in ref. 3]. In contrast, for the model we used [in ref. 3]: (i) the subpopulation allele frequencies at the test locus were pre-specified, and (ii) $p_{ref}$ is equated to our subpopulation allele frequency at the test locus.
The practical consequence of this difference is that, under our original model, the allele frequency at a null locus in a given subpopulation is closer on average to the allele frequency at the test locus for that subpopulation than it is under the BN model. Because of the different beta distributions, Dadd et al. [1] used the terms 'subpopulation allele frequency matching' for the matching we performed [in our studies, 2, 3], and 'population-level allele frequency matching' for the matching they performed.

We pre-specified the subpopulation allele frequencies at the test locus in (i) above so that we could fix different values of our PS parameter ($\delta$) and then compare the performance of DC and genomic control (GC) under different levels of PS. It is the PS at the test locus that needs to be corrected, and only by fixing allele frequencies at the test locus were we able to investigate the performance of these methods under different levels of PS. Our original aim was to show that GC fails to control for PS under high levels of PS. Irrespective of the simulations used, this statement remains true and was proved in our earlier paper [2] by using distribution theory.

Regarding the second difference in (ii) above, in all of our simulations (see below), we let $p_{ref}$ be the population-average allele frequency at the test locus. We performed two types of simulations. First, we allowed the subpopulation allele frequencies at both the test and null loci to vary according to a beta distribution, with $p_{ref}$ as the population-average allele frequency at the test locus, as in the BN model used by Dadd et al. [1]. Second, we implemented a modified BN model, which shares all the features of the first type of simulation, except that it pre-specifies the subpopulation allele frequencies at the test locus. This is done so that the performance of the different procedures can be compared under different levels of PS.
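As an illustration of the first type of simulation, the sketch below draws subpopulation allele frequencies from a beta distribution centred on a reference frequency. It assumes the commonly used Balding-Nichols parameterization, Beta($p_{ref}(1-F_{ST})/F_{ST}$, $(1-p_{ref})(1-F_{ST})/F_{ST}$); the excerpt above does not show the exact parameters used by either group, so this choice, the function name `bn_subpop_freqs`, and the numerical values are assumptions made for illustration only.

```python
import random
import statistics


def bn_subpop_freqs(p_ref, f_st, n_subpops, rng):
    """Draw subpopulation allele frequencies around p_ref.

    Assumes the usual Balding-Nichols beta parameterization; this is an
    illustrative sketch, not the simulation code of either paper.
    """
    if not (0.0 < p_ref < 1.0 and 0.0 < f_st < 1.0):
        raise ValueError("p_ref and f_st must lie strictly between 0 and 1")
    scale = (1.0 - f_st) / f_st
    alpha = p_ref * scale
    beta = (1.0 - p_ref) * scale
    return [rng.betavariate(alpha, beta) for _ in range(n_subpops)]


# Hypothetical values: p_ref = 0.3, F_ST = 0.05.
rng = random.Random(0)
draws = bn_subpop_freqs(p_ref=0.3, f_st=0.05, n_subpops=100_000, rng=rng)

mean_freq = statistics.fmean(draws)
var_freq = statistics.pvariance(draws)

# Under this parameterization the mean is p_ref and the variance is
# F_ST * p_ref * (1 - p_ref), so larger F_ST spreads the frequencies more.
expected_var = 0.05 * 0.3 * (1.0 - 0.3)
print(mean_freq, var_freq, expected_var)
```

In this sketch, the modified BN model described above would correspond to drawing only the null-locus frequencies this way while setting the test-locus subpopulation frequencies by hand.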
Both of these simulation procedures allow for extra variability in allele frequencies at the null loci, and correcting for PS under these models requires an adjusted DC method. Before we describe this adjustment, we make an important observation about the simulation results of Dadd et al., as shown in their table 3. Dadd et al. [1] used the BN model to compare the true values of $\delta$ against their estimated values, $\hat{\delta}_{2006}$ and $\hat{\delta}_{2007}$. However, as we explained above, the BN model assumes that the allele frequencies at the test locus are generated according to a beta distribution. Since the value of $\delta$ cannot be fixed under this type of simulation, the computation of $\hat{\delta}_{2006}$ and $\hat{\delta}_{2007}$ becomes meaningless. In other words, under each replication of the simulation procedure, a different value of $\delta$ is generated at the test locus, yielding an estimate of this quantity from the null loci. Being an average of these estimates across all replications, neither $\hat{\delta}_{2006}$ nor $\hat{\delta}_{2007}$ therefore estimates any given value, making the $\delta$ comparisons in table 3 of Dadd et al. [1] misleading.

We now explain why the original DC method we used [in our studies, 2, 3] does not work under the BN model, and how the method can be made to work through a simple adjustment of the DC test statistic. This adjusted method performs reasonably well under both simulation models. Consider the hypothetical scenario shown in table 1. Both null loci 1 and 2 in table 1 match the test locus in genotype frequencies (in the controls) to within a window of ±0.15, but only the estimated $\delta$ at null locus 1 has the same sign as the $\delta$ to be estimated at the test locus. Therefore, $\delta$ should be estimated by using the estimated $\delta$ at the first locus only, not by simply averaging the two estimated values. Thus, if one simply selects all null loci that match without taking into account the sign of the corresponding $\delta$, the overall $\delta$ is always considerably underestimated since, at many of the null loci, the estimated $\delta$s are negative.
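The sign argument can be illustrated with a small numerical sketch. It is not the adjusted DC statistic, which this excerpt does not reproduce; it only contrasts a plain average of per-locus estimates with an average restricted to loci whose estimate has the same sign as the target. The values in `null_deltas`, the helper names, and the assumption that the sign at the test locus is supplied as an input (`target_sign`) are all hypothetical.

```python
def naive_mean(deltas):
    """Average all matched null-locus estimates, ignoring their signs."""
    return sum(deltas) / len(deltas)


def sign_matched_mean(deltas, target_sign):
    """Average only the estimates whose sign agrees with target_sign.

    target_sign is +1 or -1 and is treated here as a given input; how it
    is determined for the test locus is not specified in this excerpt.
    Returns None when no locus has a matching sign.
    """
    kept = [d for d in deltas if d * target_sign > 0]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Hypothetical estimates at null loci that already passed the
# genotype-frequency matching window.
null_deltas = [0.08, -0.06, 0.10, -0.09, 0.07]

naive = naive_mean(null_deltas)
adjusted = sign_matched_mean(null_deltas, target_sign=+1)
print(naive, adjusted)
```

With mixed signs, the opposite-signed estimates partly cancel the others, so the plain average is pulled toward zero relative to the sign-matched average, which is the underestimation described above.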
In our original simulations, the sign 'mismatch' hardly ever arose because the subpopulation allele frequencies at all the null loci were fairly close to those at the test locus (as we explained above). The strategy of simply averaging the estimated $\delta$ at matched loci performed well for the type of simulations we performed, but is inadequate for more general simulations, such as those Dadd et al. performed. Thus, the adjusted DC statistic is […]
